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Abstract 

Recent research has shown that performance in signal processing tasks can often be significantly improved by using 
signal models based on sparse representations, where a signal is approximated using a small number of elements from 
a fixed dictionary. Unfortunately, inference in this model involves solving non-smooth optimization problems that are 
computationally expensive. While significant efforts have focused on developing digital algorithms specifically for 
this problem, these algorithms are inappropriate for many applications because of the time and power requirements 
necessary to solve large optimization problems. Based on recent work in computational neuroscience, we explore 
the potential advantages of continuous time dynamical systems for solving sparse approximation problems if they 
were implemented in analog VLSI. Specifically, in the simulated task of recovering synthetic and MRI data acquired 
via compressive sensing techniques, we show that these systems can potentially perform recovery at time scales of 
10-20/is, supporting datarates of 50-100 kHz (orders of magnitude faster that digital algorithms). Furthermore, we 
show analytically that a wide range of sparse approximation problems can be solved in the same basic architecture, 
including approximate P norms, modified norms, re-weighted 1^ and the block norm and classic Tikhonov 
regularization. 

Index Terms 

Sparse approximation, optimization, inverse problems, analog architectures, compressed sensing. 

I. Introduction 

MANY classical approaches to signal and image processing rely on applying linear filters to incoming data. 
This type of processing can be done so efficiently (especially with specialized DSP integrated circuits) that it 
is possible to build "real-time" systems for many applications. However, recent research has shown that performance 
can often be significantly improved by using nonlinear processing strategies. For example, when presented with 
imperfect data measurements (e.g., due to noise, blur, missing data, undersampling, etc.), a common approach is to 
formulate the problem as a regularized inverse problem. This strategy can be thought of in a Bayesian framework, 
where the algorithm searches for a signal that was the most likely cause for the measurements, taking into account 
a prior probability distribution (i.e., a model) on the signal. 

While such Bayesian approaches can improve performance in many signal and image processing tasks, these 
methods rely on solving non-linear optimization problems that are much more computationally expensive than 
classical linear filtering. For example, a common family of optimization programs used in this setting minimizes 
energy functions of the form 

min E = - \\x - ^a\\l + \C ia) , (1) 

a 2 

where x G M^^ is the observed measurement vector, a G is a vector representing an estimate of the signal 
(possibly through coefficients in a transform domain such as Fourier or wavelets), $ is a M x matrix representing 
a linear measurement and corruption process, C (•) is a cost function penalizing a based on its fit with the signal 
model, and A is a parameter denoting the relative tradeoff between the data fidelity term and the cost function. 
Solving this optimization program is equivalent to finding the maximum a posteriori (MAP) estimate of the original 
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signal under a Gaussian noise model, with the cost function corresponding to the log prior distribution on the signal. 
Basic signal models frequently assume independence among the elements of a, resulting in a cost function that 
separates into a sum of individual costs ^i.e., C (a) = C (ak)^. One common example is the P norm, defined 

as C{a) = \\a\\P = (Eiof)- 

Significant research activity over the last two decades has focused on signal models based on sparse representa- 
tions. In these models, the cost function C(-) is chosen to penalize signals depending on the number of non-zero 
elements (i.e., the size of the support set of a). Sparse representations have drawn significant interest because many 
natural and man-made signals can be approximated by just a few elements from an appropriately selected basis 
set |[2|. Because the program in ([T]l is actually a NP-hard problem when the cost function simply counts the number 
of non-zero coefficients |3 |, much of the recent research has focused either on developing heuristic (often greedy) 
approximate solutions [4], or providing performance guarantees for relaxed versions of the problem |5]. To date, 
the strongest theoretical guarantees involve solving the optimization problem in equation ([T]) when the cost function 
is the norm 

II I I 2 

mill - ||a3 — $a||2 + A ||a||-, , (2) 

a 2 

where \\a\\-^ = Xli^i This optimization program goes by many different names, including Basis Pursuit De- 
Noising (BPDN) in the signal processing community [6|. Surprisingly, in many cases of interest it can be shown 
that solving BPDN recovers the sparsest solution even through (|2]) is a tractable convex program Q. 

One example of the utility of BPDN is the recent work in compressed/compressive sensing (CS) ||8|-||T0l. In 
brief, the CS results give performance guarantees for inverse problems when the signals are highly undersampled 
(M <^ N) and the signal a is assumed to be sparse (having only S < M non-zeros). The main CS results 
essentially show that for certain matrices <I> (generally taken to be random), 5-sparse signals can be recovered (up 
to the noise level) by solving BPDN as long as M ~ O {S log{N/ S)). These results mean that in situations where 
measurements are costly, a signal can be undersampled during acquisition in exchange for using more computational 
resources to recover the signal at a later time. 

Despite the long history of optimization in the field of signal processing (see Mattingley & Boyd |11] for a 
detailed discussion), the recent advent of applications that utilize optimization directly to perform signal processing 
tasks (e.g., CS) highlights a specific need for online optimization solvers that can operate in real time or under 
power constraints. To mention two example applications that may specifically benefit from real-time or low-power 
BPDN solutions (respectively), CS techniques have been proposed for both medical imaging p2| and channel 
estimation for wireless communications fT3]. While we will focus on CS as an example application, sparsity-based 
models (and the corresponding optimization problems) arise in state-of-the-art solutions to problems in a variety 
of disciplines, including machine learning and computer vision |14|, as well as signal restoration (e.g., denoising, 
deblurring, superresolution, inpainting) [TS]. 

Given the importance of solving problems such as BPDN in state-of-the-art algorithms, recent research has 
focused on dramatically reducing the time it takes to solve this optimization program. Sparse approximation is 
particularly challenging because the cost function in Q, as well as many other cases of interest, is not a smooth 
function. Despite much recent progress in developing both fast general purpose convex optimization algorithms 1 1 1 1 
and specialized solvers for (|2]), these algorithms are unable to solve moderately- sized BPDN problems fast enough 
to operate in many real-time applications. In particular, most algorithms for solving BPDN have storage, time and 
power requirements that scale unfavorably with the signal size. 

Recent work in computational neuroscience has demonstrated a continuous-time dynamical system where the 
steady-state response is the solution to the program in ([T]), and the architecture of the system is designed to 
efficiently deal with sparsity-inducing cost functions. Because the dynamics of this system correspond to basic circuit 
primitives (e.g., leaky integration, simple thresholding, lateral inhibition, etc.), an analog VLSI implementation has 
the potential to be significantly faster and more power efficient than digital approaches fTF]. For example, such an 
implementation could enable applications where CS techniques are used to acquire signals very quickly and the 
signal is recovered virtually instantaneously and with minimal power, thereby eliminating the typical processing 
bottlenecks of optimization-based signal processing methods (e.g., signal recovery in CS). 

The main goal of this paper is to highlight the potential benefits and wide applicability of analog architectures 
for efficiently solving sparsity-based optimization programs. Specifically, this paper makes two main contributions. 
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First, we provide extensive simulation comparisons of analog systems and digital algorithms for solving BPDN in the 
context of CS recovery for synthetic and MRI data. These examples demonstrate that idealized analog architectures 
could potentially solve individual optimizations at time scales of of 10-20^s, supporting datarates of 50-100 kHz 
(orders of magnitude faster that digital algorithms). Second, we show that a number of other optimization problems 
arising in the signal processing and statistics communities can be solved using the same basic architecture, including 
approximate ff^ norms for < p < 1, modified norms, re-weighted and the block norm and classic 
Tikhonov regularization. 



II. Background and related work 



A. Dynamical systems for minimization 

As mentioned above, recent work in computational neuroscience has shown that dynamical systems can be 
constructed that provably solve the optimization programs in ([1]) and are efficient for solving the non-smooth 
problems of interest in sparse approximation. These systems, known as locally competitive algorithms (LCAs) | [T7| , 
are comprised of a network of analog nodes being driven by the signal to be approximated. Each node competes 
with neighboring nodes for a chance to represent the signal, and the steady-state response represents the solution 
to the optimization problem. The LCA is a specific type of Hopfield neural network, which have a long history 
of being used to solve optimization problems [18J . We note here that other types of network structures have also 
been proposed recently to approximately solve sparse approximation problems in other ways |19|, |20|. 

Specifically, the k^^ node of the LCA is associated with 0^, the k^^ column of Without loss of generality, we 
assume each column has unit norm. This node is described at a given time t by an internal state variable Uk{t). The 
coefficients a are related to the internal states u via an activation function a{t) - T\ {u{t)) that is parametrized 
by A. These activation functions are often taken to be a type of thresholding function. In the important special 
case when the cost function is separable, the output of each node k can be calculated independently of all other 
nodes by a pointwise activation function ak{t) = T\ {uk{t)). Individual nodes are leaky integrators driven by an 
input proportional to {cf)/., x), and competition between nodes occurs via lateral connections that allow highly active 
nodes to suppress nodes with less activity. The dynamics for node k are given by: 



Ukit) 



1 



N 



{x, (fij,) - Uk{t) - ^((^fc, (t)j)aj{t) 



(3) 



where r is the system time constant. In vector form, the dynamics for the whole network are given by: 

1 



u{t) = - [<^>^x - u{t) - - /) a{t)] 



(4) 



In 1 17 1 it was shown that for the energy surface E given in ([TJ with a separable cost function, the path induced by 



the LCA (using the outputs ak{t) as the optimization variable) ensures 



dE{t) 



dt 



< when the cost function satisfies: 



A 



dC (ak) 
dau 



Uk - ak = Uk - Tx (uk) 



1-1 



ak) 



ak- 



(5) 



The same arguments also extend to the more general case of non-separable cost functions, ensuring ^^,f^ < when 



AVaC(a) = u 



a 



u 



Tx{u) 



1-1 



(a) 



a. 



(6) 



Recent followup work |21] establishes stronger guarantees on the LCA, specifically showing that this system is 
globally convergent to the minimum of E (which may be a local minima if C (•) is not convex) and proving that 
the system converges exponentially fast with an analytically bounded convergence rate. 

The relationship in Q requires cost functions that are differentiable and activation functions that are invertible. 
However, the cost function for BPDN (the £^ norm) is non-smooth at the origin and the most effective sparsity- 
promoting activation functions will likely have non-invertible thresholding properties. In these cases, one can start 
with a smooth cost function that is a relaxed version of the desired cost and calculate the corresponding activation 
function. Taking the limit of the relaxation parameter in the activation function yields a formula for Ta (•) that can 
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be used to solve the desired problem. Specifically, in the appendix we use the log-barrier relaxation |22| to show 
that the LCA solves BPDN when the activation function is the well-known soft thresholding function: 



C{ak) = \ak\ <;=^ ak = Tx{uk) 
Similarly, the LCA can find a local minima to the non-convex optimization program that minimizes the "norm" 



\uk\ < A 

Uk - Asign('Ufc) \uk\> X 



of the coefficients (i.e., number of non-zeros) by using the hard thresholding activation function |17|: 

I \uk\ < A 



C(afc) = /(afc / 0) <^ ak = Tx{uk) 
where /(•) is the standard indicator function. 



Uk \uk\ > A 



B. Digital algorithms for sparse approximation 

Recent work has focused significant efforts on developing specialized algorithms for solving BPDN on digital 
platforms. Several interior point methods have been proposed in this area, including ^^-magic |23] and U-ls |24|. 



Alternatively, the GPSR algorithm |25| employs a gradient projection approach to solving the BPDN problem. 
Homotopy (or continuation) methods ||26|-|[28t take an entirely different approach, solving a series of optimization 
problems for a decreasing sequence of tradeoff parameters A and utilizing efficient updates to find these sequential 
solutions. To speed up the recovery process for very large signals, additional work has sought to leverage parallel 



hardware configurations such as multicore p9| and GPU architectures pO[ . Multicore processing makes use of the 
parallelalizable aspects of the algorithm to divide the total computational burden between the available processing 
units, incurring larger communication overhead for more processors. GPU-based algorithms mainly utilize the ability 
to perform matrix calculations substantially faster than standard processors. However, while achieving improvements 
in solution times, neither of these architectures provide favorable scaling properties and it is unclear if they would 
be able to provide real-time solutions for significantly sized problems. Also, neither architecture is appropriate for 
low-power embedded computing applications. 

Among digital algorithms, the family of iterative thresholding methods |[3T|-p5| is most similar to the LCA. 
These methods iteratively take gradient-type steps to minimize the cost function ([T]) and apply a thresholding function 
to enforce the sparsity constraints. A first-order discrete Euler approximation to the continuous-time LCA dynamics 
illustrates that the fundamental update of this analog system is basically the same as these digital algorithms, with 
the principal difference being that each step of the LCA has an incremental effect on the current solution (rather 
than taking a large step as in each iteration of the digital algorithm) |[T7|. Recently, approaches based on linearized 
Bregman iterations have also been shown to have update steps that have a similar form | [36} . 

in. Efficient analog BPDN solutions 
In this section, we demonstrate the performance of the analog LCA in simulated CS recovery problems to show 



the potential benefits of analog optimization architectures. In the first set of simulations (Sections |III-A and III-B i. 



we use synthetic stylized data to thoroughly explore the solution quality and solution times with (simulated) analog 



and digital approaches. In the second set of simulations (Section III-C I, we use MRI data to show performance on 
a large scale problem of practical importance. 

A. LCA solution quality 

To begin, we investigate the quality of simulated LCA solutions on CS recovery problems with synthetic data 
to verify that they are comparable to standard digital algorithms. While the LCA system is proven to converge 
asymptotically to the unique BPDN solution, the approximate solution achieved by any algorithm in finite time can 
have different characteristics depending on the particular solution path. In the general problem setup, the unknown 
signal ao G is 5-sparse and is observed through M < N Gaussian random projections, x = + u, where 
u is additive Gaussian noise. Following typical approaches in the CS community, we recover an estimate of oq 



by solving BPDN. We compare the simulated performance of the LCA with the interior-point method 11 -Is |24| 



and the gradient projection method GPSR |25|. This investigation will address two main questions. First, are the 
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solutions produced by the simulated LCA as accurate as the digital comparison cases? Second, what solution times 
are possible in the simulated LCA? While there may also be significant advantages in power consumption, this 
issue is implementation specific and beyond the scope of this work. 

The test CS problems can be parameterized by the number of observations M, the size of the original sparse 
signal N and the sparsity level S. We draw the nonzero coefficients of using a Gaussian distribution with variance 
1 and we draw the locations from a uniform distribution. The choice of regularization parameter A depends on 
the variance of the additive noise v which is not necessarily known a priori. We have empirically observed that 
A = .01||<I> a;||oo gives good performance in this task when the noise variance is 10"''. Additionally, we observe 
that as with many other algorithms, implementing a continuation method by gradually decreasing A (similar to that 
used in FPC [34]) also improves convergence time in the LCA. Specifically, we initialize A = ||$^a;||oo and allow 
a multiplicative decay of 0.9 at each iteration of the simulation until A reaches the desired value given above. To 
ensure that the comparison among the algorithms is fair, we use the same stopping criterion for convergence based 
on the duality gap upper bound proposed in |24|. 

To explore solution quality we display the results of solving the CS recovery optimizations using plots inspired 
by the phase plots described by Donoho & Tanner Q. We parameterize the plots using the indeterminacy of the 
system indexed by 5 = M /N, and the sparsity of the system with respect to the number of measurements indexed 
by p = S/M. We vary 5 and p in the range [.1, .9] using a 50 by 50 grid. For a given value {5, p) on the grid, we 
sample 10 different signals using the corresponding (M, N, S) and recover the signal using BPDN. We compare the 
results of the simulations by displaying in the top row of Figure [T] a phase plot for each algorithm, where the color 
code depicts the average relative MSE of the CS recovery for each algorithm (calculated by | |a — ao| I2 / | |ao| l2)- I^i 
a similar vein, the middle row of Figure [T] shows the energy function (i.e., the BPDN objective function) evaluated 
at the solution, 0.5 \\x — ^a\\2 + A \\a\\-^. 

The near identical plots for the two metrics above demonstrate that the LCA is indeed finding solutions 
of essentially the same quality as the comparison digital algorithms, both in terms of signal recovery of the 
compressively sensed signal, and in terms of the optimization objective function. When the LCA and digital 
solutions are compared directly, we find that the average difference in the solutions differs only by a relative mean- 
squared distance (calculated by ||aj;,c'^ — a£)/G'||2 / ||a_D7G|l2) °f 1-97- 10~^ when compared to Ills and 6.64- 10~^ 
when compared to GPSR. For comparison, the rMSE of the difference between the 11 -Is solutions and the GPSR 
solutions is 9.71 • 10"^, meaning that the LCA solutions have variability comparable to what the pair of comparison 
digital algorithms has between their solutions. We note that the solution differences are significantly larger between 
all of the algorithms in the regimes where CS recovery is difficult and poor solutions are found by all solvers, as 
demonstrated by the bottom row of plots in Figure [T] 



B. LCA convergence time 

To observe the potential solution times for the LCA, we compare the convergence of the LCA and GPSR on 
three specific signals in easy, medium and hard CS recovery problems with the same synthetic data as above 
(corresponding to different values of 5, p). Figure [2] shows the convergence of the relative MSE as a function of 
time for GPSR and the simulated LCA for three example signals. GPSR times are reported using measured CPlj[^ 
time, and LCA times aie reported using the number of simulated system time constants r. The simulation parameters 
used are identical to the previous simulations. While the solution paths have generally similar characteristics, the 
time scales are dramatically different. Focusing on the easy and medium CS problems that produce good recovery 
using minimization, GPSR is converging in times on the order of 0.3 seconds, whereas the LCA is converging 
in times on the order of ten time constants (lOr). We also note that while the results in Figure [2] are for individual 
signals for direct comparison with GPSR, the analysis of average case convergence for the LCA shown in Figure [3] 
and discussed below also support the same basic conclusions about the LCA convergence time. 

Though the time constant of an analog circuit depends on many factors (including the power consumption of 
the circuit), time constants on the order of 10~^ to 10~* are reasonable first-order estimates |37|. Even with the 
slowest of these time constants (r « 10"^) the analog solver is converging in approximately 10/iS of simulated 
time. This type of solution speed from the LCA is several orders of magnitude faster than GPSR and could support 



'Time is measured on a Dell Precision Desktop with dual Intel Xeon E5420 Processors and 14GB of DDR3 RAM. 
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Fig. 1. The solution quality of the LCA on a compressed sensing recovery task is comparable to the standard digital solvers GPSR and 
11-ls. The top row plots the relative MSB of the estimated signal for synthetic data, with indeterminacy of the system indexed by 5 = M/N, 
and the sparsity of the system with respect to the number of measurements indexed by p = S/M. The middle row plots the value of the 
BPDN objective function at the solutions. The bottom row plots the relative MSB in the solutions between the solvers, indicating the the 
differences in the LCA solutions are within the normal range of differences between the digital algorithms themselves. Note that all solvers 
demonstrate more variability in regions where the problems are more difficult and signal recovery cannot be performed well. 



solvers running in real time at rates of 100 kHz. We note that these times are on a similar order as the recent 
reports of small-scale implementations (especially when accounting for the interface between the analog circuit and 
the microcontroller hosting the circuit) | [38| . 

Finally, we also investigate the effect of problem size and problem difficulty {5, p) on the convergence speed 
of the LCA. For the same parameters corresponding to easy, medium and difficult CS recovery problems as used 
above, we sample 10 signals at three different problems sizes {N - 200, N - 500 and N - 1000) to perform CS 
recovery. Figure [s] displays the relative distance of the signal estimate a^*) from the true solution a as a function 
of simulated time, | |a(*) — a| [2 / | |a| jg. The plots are again shown as a function of the simulated time in terms of 
the number of system time constants r. As expected, convergence is faster and more reliable (i.e., less variance) 
for easier recovery problems (i.e., lower sparsity or more measurements). Interestingly, we note that increasing the 
signal size N does not appear to increase the solution time for the LCA. In a digital algorithm such as GPSR, 
while the number of iterations may not increase substantially, the solution time scales with N because the cost of 
each iteration (e.g., a matrix multiplication) increases significantly. In an analog system like the LCA, increasing 
the size of a matrix multiply requires increasing the circuit size and complexity. While this may increase the system 
time constant in some implementations [39|, it does not appear to require any more time constants for the system 
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Fig. 2. Temporal convergence of the LCA compared to GPSR. The plot shows the relative MSB of the signal recovery as a function of 
time for sample trials (A'^=1000) from the results in Figure [3] using GPSR (left) the simulated LCA (right). The convergence behavior is 
approximately the same, with harder problems taking both algorithms longer and decreasing the fidelity of the recovery. For the easy and 
medium difficulty problems where BPDN recovers the signal with good fidelity, GPSR takes 0.1-1 seconds to converge and the simulated 
LCA takes lO^r-lO'^r seconds to converge. For conservative values of r, the LCA solution times can still be as low as W/is, supporting 
datarates of up to 100 kHz 



LCA, N = 200 LCA, N = 500 LCA, N = 1 000 




Time (s) 

Fig. 3. Convergence behavior for the LCA for a number of different problem sizes (N,S,p). Each plot demonstrates the change in convergence 
based on easy, medium and hard CS recovery problems (i.e., 3 combinations of (S, p)) for A'^ = 200 (left), N = 500 (middle) and = 1000 
(right). While there is no appreciable increase in convergence time with increased problem size (larger N), similar to standard behavior with 
other optimization algorithms the LCA convergence time does increase with problem difficulty (smaller S and larger p). 



to settle on a solution0 



C. MRI Reconstruction 

The previous subsection demonstrated that for styUzed problems with synthetic data the LCA can achieve BPDN 
solutions and signal recoveries comparable to standard digital solvers. Furthermore the LCA appears to converge 
to solutions at speeds that would represent an improvement of several orders of magnitude over digital algorithms 
if implemented in an analog circuit. In this section we demonstrate the potential value of this system on a medical 
imaging application that could be significantly impacted by having real-time CS recovery techniques. Specifically, 
in this section we simulate the LCA recovery of undersampled MR images to evaluate the solution quality and 
speed. Compressive MRI is of particular interest because it allows shorter scan times, which improves both patient 

^Note that increasing the problem sizes does increase the time required to simulate the LCA, but not the amount of time being simulated. 
Also note that as we will discuss in the conclusions, there may be practical reasons that the system time constant r may increase with 
increasing problem sizes. 
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Actual LCA YALL1 




Fig. 4. Reconstruction of 256x192 pixel MRl images from simulated CS acquisition. The simulated LCA and the comparison digital 
algorithm (YALLl) find solutions of approximately the same quality in terms of relative MSB and image quality. YALLl finds the solution 
in approximately 10s, while the LCA finds the solution in approximately 20 time constants (2Q/is with conservative estimates of the time 
constant). 



throughput and lowers risk (e.g., shorter scan times mean that pediatric MRIs may be taken more often without 
general anesthesia |12]). Furthermore, compressive MR imaging combined with real-time image reconstruction 
would potentially allow new medical procedures to be performed using real-time, high-resolution 3-D imaging 
without using ionizing radiation. 

We simulate CS data acquisition on 21 frames of a dynamic cardiac MRI sequence]^ by subsampling the 
Fourier transform of each image (i.e., taking random columns of /c-space). Each image is 256x192 pixels, and 
we recover the images by solving BPDN to find sparse coefficients in a wavelet transform. Specifically, we solve 
the BPDN optimization program where the sensing matrix $ = FW^ is an inverse wavelet transform followed 
by a subsampled Fourier matrix, and recover the image by taking the wavelet transform of the solution to the 
BPDN problem. The choice of wavelet transforms in this case is very important, as transforms which are coherent 
with the Fourier subsampling scheme can result in poor results. We follow the work of |12] and use a 4 level 
2-dimensional Daubechies wavelet transform as the sparsifying basis. The resulting optimization is more difficult 
than the synthetic data in the previous two sections because the signals are larger and the images are sparse in a 
wavelet basis instead of the canonical basis. 

We compare results of recovery using the simulated LCA and another standard digital solver YALLl |34|. Figure |4] 
shows an example MRI image and its reconstruction using both the LCA and YALLl. The average relative MSE 
(using A = 0.001) over all 21 recovered images was 0.0109 for YALLl and 0.0106 for the simulated LCA. The 
relative differences between the LCA and YALLl solutions was 0.0042, indicating that the solution quality is 
essentially the same for both approaches. YALLl took approximately 10 second of computation time to reach this 
solution (on the same computer platform used in the previous simulations), while the LCA took approximately 20r 
simulated seconds. Again using time constant estimates of t = 10~^, this translates to solution times of 20/iS and 
datarates of approximately 50 kHz. 



IV. Alternate inference problems in the LCA architecture 



While Section III concentrated on exploring the performance of the LCA in solving the commonly used BPDN 
program, many other cost functions (i.e., signal models) fitting into the general form of ([T]l have been proposed 
in the signal processing and statistics literature to exploit sparsity in different ways. Using the basic relationships 
described in Q and ([6]), this section will present a variety of cost functions that can be optimized in the same basic 
LCA structure by analytically determining the corresponding activation function]^ These optimization programs 
include approximate norms, modified £P norms that attempt to achieve better statistical properties than BPDN, 
the group/block £^ norm that induces co-activation structure on the non-zero coefficients, re-weighted £^ and l"^ 
algorithms that represent hierarchical statistical models on the coefficients, and classic Tikhonov regularization. 



^The MRI data used was acquired using a GE I.5T TwinSpeed scanner (R12M4) using an 8 element cardiac coil. 

''We also note that a cost function might be easily implementable even in the absence of an analytic formula for the activation function 
simply by using numerical integration to find a solution and fitting the resulting curve. 
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Cost Function Threshold Function 




Fig. 5. Cost functions and tlieir corresponding thresholding functions. Left: The cost functions are compared for the (top) £^ with A = 0.5, 
scale invariant Bayes with A = 0.5, the Huber cost with A = 0.5 and e = 0.3 and (bottom) i" with A = 0.5, SCAD with A = 0.5 and a = 3.7 
and transformed £^ with thresh = 0.5 and j3 = 2. Right: The corresponding nonlinear activation function which can be used in the LCA to 
solve the regularized optimization program for each cost function. 



Before exploring specific cost functions, it is worthwhile to make a technical note regarding the optimization 
programs that are possible to implement in the LCA architecture. The strong theoretical convergence guarantees 
established for the LCA | |2T| apply to a wide variety of possible systems, but do impose some conditions on the 
permissible activation functions. We will rely on these same conditions to analytically determine the relationship 
between the cost and activation functions for the examples we consider in this section. Translated to conditions on 



the cost functions, the convergence results for the LCA | |21| require that the cost functions be positive yO (a) > 

symmetric (—a) = C (a)^, and satisfy the condition that the matrix ^AV^C (a) + is positive definite (i.e., 

Xd^C (ttk) /9a| + 1 > for separable cost functions). This last condition can intuitively be viewed as requiring 
that the activation function resulting from (|6]l has only a single output for a given input. In most cases we will 
only consider the behavior of the activation function for > because the behavior for < is implied by the 
symmetry condition. 

A. Approximate HP norms (0 < p < 2) 

When considering regularized least-squares problems of the form in ([T]|, perhaps the most widely used family of 
cost functions are the P' norms C (a) = ||a||p. These separable cost functions include ideal sparse approximation 
(i.e., counting non-zeros), BPDN, and Tikhonov Regularization f40'| as special cases (p = 0, 1 and 2, respectively), 
and are convex for p > 1. Furthermore, recent research has shown some benefits of using non-convex P norms 



(p < 1) for tasks such as CS recovery | |4T| , |42|. While the ideal activation functions can be determined exactly for 
the three special cases mentioned above (p = 0, 1 and 2), it is not possible to analytically determine the activation 
function for arbitrary values of < p < 2. Elad et al. ^^T\ recently introduced several parameterized approximations 
to the iP cost functions that are more amenable to analysis. In this section, we use these same approximations to 
determine activation functions for minimizing approximate ff' norms for < p < 2. 



1) Approximate for 1 < p < 2: For 1 < p < 2, Elad et al. |42| propose the following approximate cost 



function as a good match for the true norm for some value of parameters s and c: 

C(a) = ^ c|afc| - CS log ^1 + 
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Fig. 6. Approximate P cost functions and their corresponding thresholding functions. Left: The cost functions are approximated over the 
parameters c, s for values of p ranging from to 1 (top) and 1 to 2 (bottom). The true P costs are shown as dotted lines in the same colors. 
Using these values of c and s, a nonlinear activation function that can be used in the LCA to solve the optimization is plotted (right) using 
the thresholding equations for < p < 1 (top) and 1 < p < 2 (bottom). The thresholding functions clearly span the ranges between soft 
and hard thresholding for the lower range of p and between soft thresholding and linear amplification for the upper range of p. 



In the limiting cases, c = 1 with s — )• yields the norm and c = 2s with s — )• oo yields the £^ norm. Three 
intermediate examples for p - 1.25, 1.5 and 1.75 are shown in Figure |6] For any specific value of p, we find the 
best values of c and s by using standard numerical optimization techniques to minimize the squared error to the 
true cost function over the interval [0,2]. From this cost function, we can differentiate to obtain the relationship 
between each Uk and au as 

Uk = ak + A- 



s + ak 

We see from this relationship that with c = 1 and s — )■ 0, we obtain = Uk — \ for Uk > A (i.e., the soft- 
thresholding function for BPDN), while with c = 2s and s — )• oo we obtain = (i.e., a linear amplifier 
for Tikhonov Regularization). Solving for in terms of Uk (restricting the solution to be positive and increasing) 
yields a general relationship for the activation function 

T\ (uk) = ^ Uk - s - cX + ^/ {uk - s - cX) + AukS 

This solution is shown in Figure [6] for p - 1.25, 1.5 and 1.75 for A = 0.5. 

2) Approximate ff^ for {) <p <1: For < p < 1, Elad et al. [42] also propose the following approximate cost 
function as a good match for the true norm for some value of parameters s and c: 

C (ak) = cs log ^1 + ^ 

where the parameters c > and s > can be optimized as above to approximate different values of p. Three 
approximations for p - 0.5, 0.75 and 0.9 are shown in Figure [6] To determine the activation function, we again 
differentiate and find the appropriate relationship to be 

Acs 

Gk H ^ = Uk- 

S + Ofc 

Solving for ak reduces to solving a quadratic equation, which leads to two possible solutions. As above, we restrict 
the activation function to only include the solution that is positive and increasing, resulting in the activation function 

1 
2 



(uk) = t; {uk - s + \/ {uk + sf - 4Acs 



This activation function is only valid over the range where the output is a positive real number. If cA < s, this 
condition reduces to Uk > cA. More generally, this condition reduces to Uk > 2\/2csA — s. 
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B. Modified norms 

While the general £p norms have historically been very popular cost functions, many people have noted that this 
approach can have undesirable statistical properties in some instances (e.g., BPDN can result in biased estimates of 
large coefficients |43|). To address these issues, many researchers in signal processing and statistics have proposed 
modified cost functions that attempt to alleviate these statistical concerns. For example, hybrid norms smoothly 
morph between different norms to capture the most desirable characteristics over different regions. In this section 
we will demonstrate that many of these modified norms can also be implemented in the basic LCA architecture. 

1 ) Smoothly Clipped Absolute Deviations: A common goal for modified norms is to retain the continuity 
of the cost function near the origin demonstrated by the £^ norm, while using a constant cost function for larger 
coefficients (similar to the £^ norm) to avoid statistical biases. One approach to achieving these competing goals 
is the smoothly clipped absolute deviations (SCAD) penalty [ 44| , |45|. The SCAD approach directly concatenates 



the £^ and £^ norms with a quadratic transition region, resulting in the cost function given by 

< Cfc < A 




C (Ofc) = { (^}i\\ i"'kKX - ^ - ^) A < Ofc < kA , 



kX < ak 

for K> 1 (k defines the width of the transition region). An example of this cost function with A = 0.5 and k = 3.7 
is shown in Figure [5] 

To obtain the activation function we again solve A^^^ + Ofc = Uk for ttk as a function of n^. For SCAD (and 
all of the piecewise cost functions we consider), the activation function can be determined individually for each 
region, paying careful attention to the ranges of the inputs Uk and outputs ak to ensure consistency. For < Ofc < A, 
we have X + = u^, implying that = for < X and = — X over the interval X < < 2A. For 

A < Cfc < kA, we have 

(kA - ak) , (k - l)uk - kX 
X-, -VT- + ak = Uk =^ ak = 

[K — l)X K — 2 

over the interval 2X < Uk < nX. Finally, for kX < ak we have ak = Uk, giving the full activation function 

'0 Uk ^ ^ 

rj. r \ Uk- X X<Uk<2X 
- 1^ 2A < lifc < kA 
.Uk nX < Uk 

which is shown in Figure |5] for A = 0.5 and k = 3.7. Note that this activation function requires k> 2 (Antoniadis and 
Fan recommend a value of k = 3.7 |45]). While this is apparent from consistency arguments once the thresholding 
function has been derived, this restriction on k can also be deduced from the condition Xd^C {ak) /9a^ + 1 > 0. 

2) Transformed £^: Similar to the SCAD cost function, the transformed £^ cost ||45l, attempts to capture 
something close to the £^ norm for small coefficients while reducing the penalty on larger coefficients. Specifically, 
transformed £^ uses the fractional cost function given by 

for some /3 > 0. An example of this cost with /3 = 2 and A = 0.5 is shown in Figure [5] After calculating the 
derivative of the cost function, the activation function can be found by solving 

A/3 

for ak- Inverting this equation reduces to solving a cubic equation in a^. The three roots can be calculated 
analytically, but only one root generates a viable thresholding function by being both positive and increasing 
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for positive Uk- That root is given by 



X 1 /3 

This solution is viable only when is real valued, which corresponds to the range Uk > f^^j^^ — ^ 

Outside of this range, no viable non-zero solution exists and so - 0. The full thresholding function is shown in 
Figure |5] for A = 0.5 and 13-2. While it is interesting that an analytic form can be determined for this activation 
function, the expression is obviously very complex and would likely have to be approximated by curve fitting in 
any circuit implementation. 



3) Huber Function: The Huber cost function |47| aims to modify standard £^ optimization to improve the 
robustness to outliers. This cost function consists of a quadratic cost function on smaller values and a smooth 
transition to an cost on larger values, given by 



< |afc| < e 
- § e < \ak\ 

An example of the Huber cost is shown in Figure [5] for A = 0.5 and e = 0.3. As in the case of other piecewise cost 
functions, we calculate the activation function separately over each interval of interest by calculating the derivative 
of the cost function in each region. For the first interval, the relationship is given by ^ = Uk — ak, which obviously 
gives the activation function Tx (u^) = for < e + A. For the second interval, we have A|^ = — a^, 

which yields the activation function Tx {uk) = ^1 — for > e + A. Putting the pieces together, the full 

activation function (as expected) is a mixture of the Tikhonov regularization and the soft thresholding used for 
optimization given by 



Ofc = Tx (uk) 



m \uk\<e + X 



which is shown in Figure [5] for A = 0.5 and e = 0.3. We can see that as e — 0, the cost function converges to the 
£^ norm and the thresholding function correctly converges back to the soft-threshold function derived earlier using 
the log-barrier method. 

4) Amplitude Scale Invariant Bayes Estimation: A known problem with using the £^ norm as a cost function 
is that it is not scale invariant, meaning that the results can be poor if the amplitude of the input signals changes 
significantly (assuming a constant value of A). Many cost functions (including the ones presented above) are 
heuristically motivated, drawing on intuition and tradeoffs between the behavior of various norms. In contrast, 
Figueiredo and Nowak | ,48J approach the problem from the perspective of Bayesian inference with a Jeffreys' prior 
to determine a cost function with more invariance to amplitude scaling, similar to the non-negative Garrote ||49|. 
We consider here the cost function 



a? afcJa^ + 4A2 / , \ 

^H = E-ii + ^4A + Alog(a, + ^ai + 4AM 

k 



which is proportional to the one given by Figueiredo and Nowak |48| and is shown in Figure [5] for A = 0.5. 



Taking the derivative of this cost function, we end up with the relationship between Uk and 

2A 



u,-ak = -2X^ + -^al + 4X^. 
Solving for as a function of Uk yields the following activation function. 



a-k = Tx (uk) 



Uk < X 

X^)/uk Uk> X' 



[u 
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Fig. 7. The nonlinear activation function used in the LCA to optimize the non-overlapping group LASSO cost function has multiple inputs 
and multiple outputs. The plot shows an example thresholding function for both elements in a group of size two (A = 0.5), with each line 
illustrating the nonlinear elTect on ai while U2 is held constant. 



matching the results from Figueiredo and Nowak |48|. This activation function is shown in Figure [5] for A = 0.5. 



C. Block 

While all cost functions discussed earlier in this section have been separable, there is increasing interest in the 
signal processing community in non-separable cost functions that capture structure (i.e., statistical dependencies) 
between the non-zero coefficients. Perhaps the most widely cited cost function discussed in this regard is the block 
£^ norm (also called the group £^ norm), which assumes that the coefficients representing x are active in known 
groups. In this framework, the coefficients are divided into blocks, Ai C {a^} and each block of coefficients Ai 
is represented as a vector a'. For our purposes, we assume the blocks are non-overlapping but may have different 
cardinalities. The block norm |50| is defined as the £^ norm over the norms of the groups, 

C{a) = ^\\a' 
I 

essentially encouraging sparsity between the blocks (i.e., requiring only a few groups to be active) with no individual 
penalty on the coefficient values within a block. Because this cost is not separable, the activation function will no 
longer be a pointwise nonlinearity and will instead have multiple inputs and multiple outputs. 

Following the same general approach as above, we calculate the gradient of the cost function for each block, 

V„..C(a)= 



a' ^ y-^J II ; I 
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yielding the following relationship between the activation function inputs and outputs 

J 



u'' = a^ + X——. (7) 



a 



,|a'ii2 

J 



While directly solving this relationship for a appears difficult, we note that we can simplify the equation by 
expressing ||a'||2 in terms of Hw'H^. To see this, take the norm of both sides of ([7]) to get llw'112 = ||a'||2 + A. 
Substituting back into Q, the relationship simplifies to 



Ta (uM = a' = ( 1 



A 



\u'-\ 



over the range < Ija^H^ = 11^^112 — A, implying A < ll^'Hg- This relationship yields the block- wise thresholding 



14 



function 

This activation function can be thought of as a type of shrinkage operation applied to an entire group of coefficients, 
with a threshold that depends on the norm of the group inputs. For the case of groups of two elements (with A = 0.5), 
Figure [7] shows the nonUnearities for each of the two states as a function of the value of the other state. 

D. Re-weighted £^ and £^ 

Recent work has also demonstrated that re-weighted £p norms can achieve better sparsity by iteratively solving 



a series of tractable convex programs |51 1-|54|. For example, re- weighted | |53| is an iterative algorithm where a 
single iteration consists of solving a weighted minimization (a) = Afc|afc|^ , followed by a weight update 
according to the rule 

Afe oc - — ^ — , (8) 

\ak\ + 7 

where 7 is a small parameter. By having approximately equal to the inverse of the £^ norm of the coefficient 
from the previous iteration, this algorithm is more aggressive than BPDN at driving small coefficients to zero and 
increasing sparsity in the solutions. Similarly, re-weighted i"^ algorithms | |51| have also been used to approximate 
different p-norms with weights updated as 

Such schemes have shown many empirical benefits over norm minimization, and recent work on re-weighted £^ 



has established theoretical performance guarantees |55| and interpretations as Bayesian inference in a probabihstic 



model |54|. 



One of the main drawbacks to re-weighted algorithms is the time required for solving the weighted program 
multiple times. Because we have established earlier that the LCA architecture can solve the norm optimizations 
(and weighted norms are a straightforward extension to those results), it would immediately follow that a dynamical 
system could be used to perform the optimization necessary for each iteration of the algorithm. While this would 
be a viable strategy (and would save significant time compared to digital solvers, as evidenced by the results in 



Section III-B 1, we show here that even more advantages can be gained by performing the entire re-weighted £^ 
algorithm in the context of a dynamical system. Specifically, we consider here a modified version of the LCA where 
an additional set of dynamics are placed on A in order to simultaneously optimize the coefficients and coefficient 
weights in an analog system. While the ideas here are expandable to the general re-weighted case, we focus on 
results involving the re-weighted £^ as presented in [54J. 
The modified LCA is given by the system equations: 

Tutiit) = <^^x - u{t) - ($^^> - /) a{t) 
a{t) = Tx {u{t)) 
TxXkit) = A^i(t) - u-^ {\ak{t)\ + 7) 

At steady state, A = which shows that A^ {00) abides by ([8]) with v representing the proportionality constant. 
While the complete analysis of this expanded analog system is beyond the scope of this paper, we show in Figure [8^ 
simulations which demonstrate that this system reaches a solution of comparable quality to digital iterative methods. 
Figure [8^ plots the relative MSE from a CS recovery problem with length- 1000 vectors from 500 noisy measurements 
with varying levels of sparsity. We sweep the parameter p = S/M from zero to one and set the noise variance to 
10~^, with each plot representing the relative MSE averaged over 15 randomly chosen signals. Figure [sj a) plots 
the recovery quality for three systems: iterative re-weighted (using GPSR to solve the £^ iterations), iterative 
re-weighted £^ (using the LCA to solve the £^ iterations), and dynamic re-weighted £^ which uses the modified LCA 
described above. It is clear that the three systems are achieving nearly the same quality in their signal recovery. 
Figure [8b plots the convergence of the recovery as a function of time (in terms of system time constants r) for 
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the iterative and dynamic re-weighted approaches using the LCA. The dynamically re-weighted system clearly 
converges more quickly, achieving its final solution in approximately the time it takes to perform two iterations of 
the traditional re-weighting scheme using the standard LCA. 




0.2 0.4 0.6 0.8 1 100r 300r 500T 700T 900T 

P Time (s) 

(a) (b) 



Fig. 8. Re-weighted optimization in digital algorithms and in a modified LCA. (a) Re-weighted optimization for a signal with 
N — 1000 and S = 0.5, with p swept from to 1. The traditional iterative re-weighting scheme is performed with both a standard digital 
algorithm (GPSR) and the LCA. For comparison, a dynamic re-weighting scheme where the LCA is modified to have continuous dynamics 
on the regularization parameter (rather than discrete iterations) is also shown. Each method is clearly achieving similar solutions, (b) The 
temporal evolution of the recovery relative MSB for a problem with A'^ — 1000, S = 0.6 and p = 0.45. Solutions are shown for the amount 
of simulated time (in terms of number of time constants). The dynamically re- weighted system converges in approximately the time it takes 
to use the LCA to solve two iterations of the traditional re-weighted (.^ algorithm. 



V. Conclusions and future work 

Sparsity-based signal models have played a central role in many state-of-the-art signal processing algorithms. The 
resulting shift toward optimization as a fundamental computational tool in the signal processing toolbox has made 
it difficult to implement many of these algorithms in applications with significant power constraints or real-time 
processing requirements. The main contributions of this paper have been to illustrate the potential advantages of 
using an analog dynamical system to perform sparse approximation in an analog integrated circuit. Specifically, 
our simulations have demonstrated that the idealized LCA could solve problems of significant size on time scales 
of approximately 10-20/iS, corresponding to real-time solvers at rates approaching 50-100 kHz. Interestingly, and 
in stark contrast to using digital algorithms on the same problems, the solution times in the idealized LCA do not 
appear to scale significantly with the problem size. Beyond the minimization problem that is most commonly 
referenced in the literature, we have also demonstrated that the same network structure can implement a wide variety 
of other cost functions from the signal processing and statistics literature that are related to sparse approximation. 

From these results we conclude that solving sparse approximation problems via analog dynamical systems could 
have a significant impact on a wide range of applications and certainly warrants further investigation. In the case 
of CS, the typical mantra has been that CS techniques can help when measurements are expensive and the user 
is willing to trade reduced measurement burdens for increased computational complexity during signal recovery. 
The potential performance of an implementation of the LCA could remove the current bottleneck of CS recovery, 
making CS techniques applicable in an even wider variety of applications. With the increased interest in using 
signal models that incorporate more information than simple signal sparsity (e.g., 'structured sparsity' models) 



for improved CS performance |56|, an interesting avenue for future study would be to develop efficient dynamical 
systems for performing inference in models with more complex structure than the group £^ norm already established 
in this paper. 
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The design and implementation of analog circuits has traditionally been difficult, and it is not immediately 
clear that the potential benefits of the idealized LCA illustrated in this paper could be achieved in an actual 
implementation. As mentioned earlier, the development of reconfigurable analog chips \57j have improved many of 
the issues related to barriers in the design phase of analog integrated circuits. In fact, the reconfigurable platform 
described in | |57[ has been used to implement a small version of the LCA for solving BPDN p9[ . The preliminary 
tests of this LCA implementation are on the same order as the simulated solution speeds shown in the present 
work. 

Implementing a system such as the LCA at a scale large enough to be useful in applications will present additional 
issues that must be addressed in future work. In particular, the mismatch between elements inherent in the fabrication 
process and the scaling of the time constant due to factors such as increased load capacitance present challenges 
that could reduce the effectiveness of the idealized system. In addition to large scale implementations, interesting 
future work would include establishing bounds on the solution errors in terms of fabrication mismatch, exploring 
system designs that exhibit the least potential for time constant increases as the system scales and determining the 
viability of hybrid analog-digital systems that achieve the benefits of both modalities. We note here that the initial 
prototype implementation in | ,39J reported a system with solutions achieving relative MSB of less than 5%. 
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Fig. 9. Log barrier relaxations of BPDN. (a) Tiie cost function approacfies tlie ideal (.^ norm as the relaxation parameter is increased, 
(b) In a similar way, the nonlinear activation function derived for the LCA approaches the ideal soft-thresholding operator as the relaxation 
parameter is increased. 



Appendix 

Soft-threshold activation for BPDN using the log-barrier relaxation 

We will first rewrite the desired BPDN problem in equation ([2]) in an extended formulation to make the variables 
non-negative. Define a new M x 2N matrix through the concatenation operation <I> = [<I> —<!>]. Similarly define a 
vector z = [z+ Z-] of length 2N such that Zi>Q and a = Zj^ — Z-. Essentially z represents the original variables 
a by separating them into two subvectors depending on their sign. We can then write a constrained optimization 
program that is equivalent to BPDN: 

2N 



1 



mm 



X 



^z 



k=l 



s.t. Zk > 0. 



(9) 



This reformulation is a standard way to show that £^ cost penalties are equivalent to a linear function in a constrained 
optimization program. One can then apply the standard log-barrier relaxation to convert the program in ([9]) to an 
approximately equivalent unconstrained program: 



1 

mill - 
z 2 



X 



^z 



fc=l ^ '^^ k=l 



(10) 



As 7 — oo, this program approaches the desired program (|9]). This relaxation strategy underlies an interior point 



algorithm (called the barrier method) for solving convex optimization programs, where ( |T0| ) is repeatedly solved 
with increasing values of 7 [22]. 

Note that the relaxed problem in (10) fits the form of the general optimization program stated in ([T]) with the 
differentiable cost function C (zk) = z^ — ^-^^^j^- For a fixed value of 7, this cost function can be differentiated and 
used in the relationship given in Q to solve for z^ in terms of to find the corresponding invertible activation 
function: 

1 ( / 4 + 7(A-^fc)2 \ 
Zk = Tx [uk) = 2 I V - ""fcj I • 



Finally it is straightforward to show that in the relaxation limit (7 — )• 00) where the program in ([TO]) approaches 
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BPDN, the desired activation function becomes the soft-thresholding function: 




4 + 7(A-'Ufc)2 A 1/ /7T y2 f. \\ 1^ when Ufe < A 

7 I Z \ /lufc — A when Uk > X 



To illustrate the convergence of this relaxation to the desired £^ cost function and the corresponding soft-threshold 
activation function, Figure |9] plots C (•) and Tx (•) in this relaxed problem for several values of 7. Note that in the 
extended formulation of BPDN given in (|9]l, the variables occur in pairs where where only one of them can be 
nonzero at a time. Because the activation function is zero for all state values with magnitude less than threshold, 
it is possible to represent each of these pairs of variables in one LCA node that can take on positive and negative 
values and where the activation function is a two-sided soft-thresholding function (thereby reducing the number of 
nodes back down to A^). 



